The Noise Component in Model-based Clustering

نویسنده

Pietro Coretto

چکیده

Model-based cluster analysis is a statistical tool used to investigate groupstructures in data. Finite mixtures of Gaussian distributions are a popular device used to model elliptical shaped clusters. Estim ation of mixtures of Gaussians is usually based on the maximum likelihood method. However, for a wide class of finite mixtures, including Gaussians, maximum likelihood estimates are not robust. This implies th a t a small proportion of outliers in the data could lead to poor estimates and clustering. One way to deal with this is to add a “noise component” , i.e. a mixture component th a t models the outliers. In this thesis we explore this approach based on three contributions. First, Fraley and Raftery (1993) propose a Gaussian mixture model with the addition of a uniform noise component with support on the data range. We generalize this approach by introducing a model, which is a finite mixture of location-scale distributions mixed with a finite number of uniforms supported on disjoint subsets of the data range. We study identifiability and maximum likelihood estimation, and provide a com putational procedure based on the EM algorithm. Second, Hennig (2004) proposed a sort of model in which the noise component is represented by a fixed improper density, which is a constant on the real line. He shows th a t the resulting estimates are robust to extreme outliers. We define a maximum likelihood type estim ator for such a model and study its asymptotic behaviour. We also provide a method for choosing the improper constant density, and a computational procedure based on the EM algorithm. The th ird contribution is an extensive simulation study in which we measure the performance of the previous two methods and certain other robust m ethod ologies proposed in the literature.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

متن کامل

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

Accurate Fruits Fault Detection in Agricultural Goods using an Efficient Algorithm

The main purpose of this paper was to introduce an efficient algorithm for fault identification in fruits images. First, input image was de-noised using the combination of Block Matching and 3D filtering (BM3D) and Principle Component Analysis (PCA) model. Afterward, in order to reduce the size of images and increase the execution speed, refined Discrete Cosine Transform (DCT) algorithm was uti...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy CMeans (BWFCM). We used a new objective function that uses some k...

متن کامل

Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

Homogeneity of groups in studies those use cross section and multi-level data is important. Most studies in economics especially panel data analysis need some kinds of homogeneity to ensure validity of results. This paper represents the methods known as clustering and homogenization of groups in cross section studies based on enviro-economics components. For this, a sample of 92 countries which...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2013

The Noise Component in Model-based Clustering

نویسنده

چکیده

منابع مشابه

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Speech enhancement based on hidden Markov model using sparse code shrinkage

Accurate Fruits Fault Detection in Agricultural Goods using an Efficient Algorithm

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

Bilateral Weighted Fuzzy C-Means Clustering

Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

عنوان ژورنال:

اشتراک گذاری